Wendy Lehnert

University of Massachusetts

CRYSTAL: Inducing a Conceptual Dictionary

May 09, 1995

Stephen Soderland, David Fisher, Jonathan Aseltine, Wendy Lehnert

Figures 1-4 for CRYSTAL: Inducing a Conceptual Dictionary

Abstract: One of the central knowledge sources of an information extraction system is a dictionary of linguistic patterns that can be used to identify the conceptual content of a text. This paper describes CRYSTAL, a system which automatically induces a dictionary of "concept-node definitions" sufficient to identify relevant information from a training corpus. Each of these concept-node definitions is generalized as far as possible without producing errors, so that a minimum number of dictionary entries cover the positive training instances. Because it tests the accuracy of each proposed definition, CRYSTAL can often surpass human intuitions in creating reliable extraction rules.

* 6 pages, Postscript, IJCAI-95 http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.html
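The idea of generalizing each definition as far as possible without producing errors can be illustrated with a toy covering loop. This is only a minimal sketch under stated assumptions, not CRYSTAL's actual procedure: the flat feature-to-value encoding of instances, the function names (`covers`, `generalize`, `induce_dictionary`), and the greedy constraint-dropping strategy are all hypothetical choices made for illustration, and the example data is invented.

```python
from typing import Dict, List

# Hypothetical encoding: an instance maps feature names to values, and a
# definition is a set of feature constraints an instance must satisfy.
Instance = Dict[str, str]
Definition = Dict[str, str]


def covers(definition: Definition, instance: Instance) -> bool:
    """Return True if the instance satisfies every constraint."""
    return all(instance.get(k) == v for k, v in definition.items())


def generalize(seed: Instance, negatives: List[Instance]) -> Definition:
    """Greedily drop constraints from the seed while no negative is covered.

    Assumption: constraint dropping stands in for whatever generalization
    operator the real system uses.
    """
    definition = dict(seed)
    for feature in sorted(seed):
        trial = {k: v for k, v in definition.items() if k != feature}
        # Test the proposed definition against the training data before
        # accepting it, so generalization never introduces an error.
        if not any(covers(trial, neg) for neg in negatives):
            definition = trial
    return definition


def induce_dictionary(
    positives: List[Instance], negatives: List[Instance]
) -> List[Definition]:
    """Add definitions until every positive training instance is covered."""
    dictionary: List[Definition] = []
    uncovered = list(positives)
    while uncovered:
        definition = generalize(uncovered[0], negatives)
        dictionary.append(definition)
        uncovered = [p for p in uncovered if not covers(definition, p)]
    return dictionary


# Invented toy data, for illustration only.
positives = [
    {"verb": "diagnosed", "voice": "passive", "object_class": "disease"},
    {"verb": "diagnosed", "voice": "active", "object_class": "disease"},
]
negatives = [
    {"verb": "diagnosed", "voice": "passive", "object_class": "person"},
    {"verb": "treated", "voice": "passive", "object_class": "disease"},
]

dictionary = induce_dictionary(positives, negatives)
print(dictionary)
```

On this toy data the `voice` constraint can be dropped safely, so a single entry covers both positive instances. A greedy loop like this one does not guarantee the smallest possible dictionary; it only shows how testing each candidate against the training corpus keeps generalization from introducing errors.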

Corpus-Driven Knowledge Acquisition for Discourse Analysis

Jun 07, 1994

Stephen Soderland, Wendy Lehnert

Figures 1-4 for Corpus-Driven Knowledge Acquisition for Discourse Analysis

Abstract: The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher levels as well. In this paper we will show how ML techniques can be used to support knowledge acquisition for information extraction systems. It is often very difficult to specify an explicit domain model for many information extraction applications, and it is always labor intensive to implement hand-coded heuristics for each new domain. We have discovered that it is nevertheless possible to use ML algorithms in order to capture knowledge that is only implicitly present in a representative text corpus. Our work addresses issues traditionally associated with discourse analysis and intersentential inference generation, and demonstrates the utility of ML algorithms at this higher level of language analysis. The benefits of our work address the portability and scalability of information extraction (IE) technologies. When hand-coded heuristics are used to manage discourse analysis in an information extraction system, months of programming effort are easily needed to port a successful IE system to a new domain. We will show how ML algorithms can reduce this

* 6 pages, AAAI-94
